feat(read): implement read builtin with IFS field-splitting by matt-dz · Pull Request #211 · DataDog/rshell

matt-dz · 2026-05-01T17:29:24Z

What does this PR do?

Adds the read shell builtin to the restricted shell. read consumes a delimited chunk from stdin and assigns each IFS-split field to a shell variable (defaulting to REPLY when no name is given).

This is the first builtin in the codebase that mutates parent-shell state, so the change extends the builtin contract with two minimal capabilities on builtins.CallContext:

SetVar(name, value) error — routes through the runner's existing setVarErr, so MaxVarBytes (1 MiB) and MaxTotalVarsBytes (1 MiB) are enforced for free.
GetVar(name) (string, bool) — needed so read can consult IFS for field-splitting.

These two together form the minimal complete contract for any future state-aware builtin (export, unset, shift, …).

Motivation

read is a fundamental POSIX builtin. The canonical idiom while IFS= read -r line; do …; done < file is widely used in shell scripts. Until now the shell had no way to express it; read was explicitly blocked at dispatch time. Lifting that block required granting builtins a way to mutate parent-shell variables, which was a deliberate architectural decision discussed during implementation rather than a one-line capability add.

Surface

Supported flags (six + --help):

Flag	Behaviour
`-r`	Raw mode (no backslash interpretation)
`-p PROMPT`	Print PROMPT to stderr; only when stdin is a TTY (matches bash)
`-d DELIM`	Single-rune delimiter; `-d ''` means NUL
`-n NCHARS`	At most N characters (UTF-8 aware), capped at 1 MiB
`-N NBYTES`	Exactly N bytes, ignoring delimiters, capped at 1 MiB
`-t SECONDS`	Read deadline via `*os.File.SetReadDeadline`; `context.WithTimeout` fallback. Returns exit 142 on timeout (bash convention: 128 + SIGALRM)

Excluded from v1 with rationale in SHELL_FEATURES.md: -a (needs a parallel SetArrayVar contract addition; can follow later), -s/-u/-e/-i (interactive-terminal features that don't apply in a non-interactive shell).

Memory & DoS hardening

-n and -N are rejected at flag-parse time when N > 1 MiB or N < 0. The 1 MiB anchor matches the existing MaxLineBytes cap in cat / head and the runner's MaxVarBytes/MaxTotalVarsBytes.
readInput enforces a separate runtime byte ceiling (also 1 MiB) so a UTF-8-amplified -n cannot exceed the buffer cap.
Bytes are read one at a time without buffering so subsequent read calls on the same stdin observe unconsumed bytes (a bufio.Reader would swallow bytes past the delimiter — common bug in naive implementations).
Defenses layered: flag-time rejection → runtime byte cap → MaxVarBytes on the assigned value → MaxTotalVarsBytes on the aggregate variable store.

IFS field-splitting

Implements POSIX read semantics:

Whitespace IFS chars (space, tab, newline) coalesce; non-whitespace IFS chars introduce a separate (possibly empty) field per occurrence.
Leading and trailing IFS-whitespace is stripped from the input.
When the input has more fields than names, the last variable absorbs the remainder verbatim with only trailing IFS-whitespace stripped.
When fewer fields than names, missing variables are set to empty.

Testing

16 new YAML scenarios under tests/scenarios/cmd/read/:

basic/ (7) — single var, default REPLY, multi-var, last-absorbs, fewer-fields, redirect-from-file, EOF-returns-1.
flags/ (6) — raw vs non-raw, line continuation, delimiter, -n, -N.
ifs/ (3) — default IFS (space/tab), custom IFS (:), empty IFS (no stripping).
errors/ (4) — -n and -N cap rejection, invalid timeout, invalid identifier.
help/ (1) — --help exits 0 with usage.

The non-bash-comparison scenarios all pass locally. Bash byte-for-byte comparison (RSHELL_BASH_TEST=1) was not run locally — Docker daemon was not running. Worth verifying in CI.

Three pre-existing help scenarios were updated for the bumped builtin count (28 → 29). The obsolete shell/blocked_commands/.../read.yaml scenario was deleted.

Symbol allowlists in analysis/symbols_builtins.go updated: 10 new symbols added to the global ceiling (context.CancelFunc, context.DeadlineExceeded, fmt.Fprint, os.ErrDeadlineExceeded, os.File, os.ModeCharDevice, time.Now, unicode/utf8.{DecodeLastRuneInString,FullRune,RuneSelf}) plus a new "read": per-command sublist.

Reviewer notes

The two-field expansion of CallContext is the architecturally significant change. Future state-aware builtins should reuse SetVar/GetVar rather than reaching past them.
Non-raw backslash handling does require multi-rune lookahead (\<delim> is a line continuation, \<X> is just X). The readInput function handles this with a single helper closure for rune reads.
-p is silent in this codebase because stdin is always a pipe in scripted contexts. The TTY check is there for parity in case stdin is ever wired to a terminal by a calling embedder.
Per AGENTS.md, this PR is opened as a draft; please do not add the verified/allowed_symbols label automatically — that is reserved for human approval after reviewing the symbol-allowlist changes.

Checklist

Tests added (16 YAML scenarios)
Documentation updated (SHELL_FEATURES.md)
Symbol allowlists updated
make fmt clean, go vet clean, full test suite passes locally
Bash comparison verified locally (skipped — Docker not running)

🤖 Generated with Claude Code

Adds the `read` shell builtin, which consumes a delimited chunk from stdin and assigns the IFS-split fields to one or more shell variables (defaulting to REPLY when no name is given). This is the first builtin in the codebase that mutates parent-shell state, so the change extends the builtin contract with a minimal pair of capabilities. Contract change (builtins.CallContext): - SetVar(name, value) error: assign to a shell variable; routes through the runner's setVarErr so MaxVarBytes and MaxTotalVarsBytes are enforced. - GetVar(name) (string, bool): read a shell variable (used to consult IFS for field-splitting). Wired at both CallContext construction sites in interp/runner_exec.go. Builtin (builtins/read/read.go) supports: -r raw mode (no backslash interpretation) -p PROMPT print to stderr (only when stdin is a TTY, matches bash) -d DELIM single-rune line delimiter; -d '' uses NUL -n NCHARS read at most N characters (UTF-8 aware), capped at 1 MiB -N NBYTES read exactly N bytes ignoring delimiters, capped at 1 MiB -t SECONDS timeout via *os.File.SetReadDeadline (or context.WithTimeout fallback); returns 142 on timeout to match bash --help usage to stdout Excluded from v1: -a (would need SetArrayVar), -s/-u/-e/-i (interactive features that don't apply to a non-interactive shell). Documented in SHELL_FEATURES.md. Field-splitting follows POSIX read semantics: whitespace IFS chars coalesce, non-whitespace IFS chars introduce separate fields, leading and trailing IFS-whitespace is stripped, and the last variable absorbs the remainder when there are more fields than names. Bytes are read one at a time without buffering so subsequent reads from the same stdin observe unconsumed bytes (a buffered reader would swallow them past the delimiter). Tests: 16 new scenarios under tests/scenarios/cmd/read/ covering basic, flags, ifs, errors, and help. Three help scenarios bumped from 28 to 29 builtins; the obsolete read-is-blocked scenario removed. Symbol allowlists updated for the new package's imports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Builtins must not call time.Now directly; the canonical pattern is callCtx.Now (script-start reference, captured once per Run for stable multi-builtin reasoning) for things like find -mtime / ls -l. For read -t SECONDS we want a true monotonic "from now" deadline, not a script-start reference (a 5s timeout 10s into the script must not fire immediately). callCtx.Now is therefore wrong here. The right answer is to read the deadline already computed by context.WithTimeout — that call is on the symbol allowlist, the stdlib does the time.Now lookup internally, and ctx.Deadline() lets us project that same instant onto *os.File.SetReadDeadline so kernel and context deadlines stay in sync. Removes time.Now from both the global builtin allowlist and the read per-command sublist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-dz · 2026-05-01T17:41:22Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cbf514b698

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Four bash-compatibility fixes flagged by codex: 1. Preserve unsplit input for REPLY and -N When no NAME is given, bash assigns the raw line to REPLY without IFS trimming or splitting. With -N, bash assigns the whole read to the first NAME and leaves remaining NAMEs empty (no IFS splitting). Both paths now skip splitIFS entirely. 2. Count -N input as characters, not raw bytes Bash's -N is character-based, not byte-based: read -N 1 over "éx" stores "é" (2 bytes), not the first byte of the UTF-8 sequence. Dropped the dedicated byte-mode branch and unified -N with the default rune+escape path; -N differs only in that it ignores the delimiter rune. Backslash handling in -N now matches bash too (escapes processed unless -r). 3. Permit values exactly at the read byte cap The cap check fired before the -n / delimiter checks, so a value of exactly MaxReadBytes was rejected on the next iteration even though the flag validation and MaxVarBytes both allow that size. The cap now fires only when an append would push past it, so a 1 MiB value/line followed by EOF or the delimiter returns successfully. 4. Handle a single trailing IFS separator `printf 'a:b:\n' | { IFS=: read a b; }` should give b="b" because the trailing colon represents an empty trailing field that bash silently drops. splitIFS now strips a lone trailing non-ws IFS character from the last absorbed field when it is the only such character in that field. Verified across the spectrum: a:b: → b, a:b:: → b::, a::b: → :b: (only one strip when there is exactly one trailing IFS char in the absorbed remainder). New scenarios: basic/reply_preserves_padding.yaml flags/N_counts_chars_utf8.yaml flags/N_applies_escape.yaml flags/N_skips_field_split.yaml ifs/trailing_single_ifs_stripped.yaml All 22 read scenarios pass; full suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The bash comparison test runs in debian:bookworm-slim's default C/POSIX locale, where bash's `read -N` counts bytes (so `read -N 1` over `éx` returns the first byte 0xC3 — an invalid UTF-8 fragment). rshell always counts UTF-8 characters regardless of locale, so the test assertion `[é]` matched bash on macOS (UTF-8 locale) but not in CI. Setting LC_ALL=C.UTF-8 for the bash side puts bash in UTF-8 mode, making the comparison meaningful: both shells now treat -N as character-based on UTF-8 input. Verified locally with `LC_ALL=C bash` (returns `[\xc3]`) vs `LC_ALL=C.UTF-8 bash` (returns `[é]`). The intentional divergence in C-locale environments is documented in the scenario comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-dz · 2026-05-01T18:29:38Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 388464302b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Three bash-compatibility fixes from the latest codex review: 1. Preserve partial input on read timeouts When `-t` fires after some data has been read but before the delimiter, bash assigns the partial line and returns 142. Previous implementation returned 142 immediately and discarded the bytes. Restructured run() to track a `timedOut` flag and fall through to the assignment path when there is data, surfacing 142 only at the end. Empty result + timeout still returns 142 with no assignment. 2. Handle `-t 0` as a non-blocking poll Bash's `read -t 0` returns 0 if input is immediately available (data buffered or EOF), 142 otherwise — no waiting, no consumption. Previously `context.WithTimeout(ctx, 0)` produced an already-expired context that always reported timeout. Added pollAvailable() that sets an in-the-past read deadline on *os.File stdin (computed from c.Now to avoid a direct time.Now call) and probes for one byte. The byte is consumed on success — an intentional divergence from bash, documented inline; matching bash exactly would require a platform-specific select(2) call. 3. Use a real isatty check for the -p prompt `os.ModeCharDevice` matched any character device including /dev/null, so `read -p PROMPT x </dev/null` would emit the prompt; bash suppresses it. Replaced with golang.org/x/term.IsTerminal, which uses the platform's TIOCGETA / GetConsoleMode and returns false for /dev/null, pipes, and regular files. Symbol allowlist updates: + golang.org/x/term.IsTerminal (global ceiling + read sublist) + time.Hour (read sublist; in-the-past poll deadline) - os.ModeCharDevice (no longer needed; removed from global ceiling and read sublist). go.mod gains a direct dep on golang.org/x/term. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-13b845 # Conflicts: # SHELL_FEATURES.md # analysis/symbols_builtins.go # interp/register_builtins.go # tests/scenarios/cmd/help/restricted.yaml # tests/scenarios/cmd/help/restricted_all_flag.yaml # tests/scenarios/cmd/help/unrestricted.yaml # tests/scenarios/cmd/help/unrestricted_all_flag.yaml

matt-dz · 2026-05-04T14:08:16Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a82f8e2890

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Three bash-compatibility fixes from the latest codex review: 1. Defer invalid-name failure until assignment time For `printf 'a b c\n' | { read good 1bad mid; }` bash assigns good=a, prints an error for 1bad, leaves mid untouched, and returns 1. Previously rshell rejected the whole call before the read ran, so even good was left empty. The validation has been moved into the assignment loop: each name is checked just before its SetVar call, and the error message now matches bash's exact format (`read: \`1bad': not a valid identifier`). 2. Honour the last of -n / -N when both are specified bash applies last-set-wins: `read -n 1 -N 2` reads 2 chars, `read -N 2 -n 1` reads 1. The previous implementation always chose -N when present. pflag's `Changed()` only reports whether a flag was set, not relative parse order, so we install a custom `pflag.Value` (`orderedIntValue`) that records, on each Set(), a monotonic counter shared between the two flags. The handler picks the flag with the larger counter. 3. Don't consume input for `read -t 0` (Unix) bash's `read -t 0` is a non-consuming availability probe, so `read -t 0 && read line` over `printf "hello\n"` should leave `line=hello`. The previous implementation set a deadline in the past and read one byte, which leaked the byte on success. Added a platform-specific `pollInputNonConsuming` helper using `unix.Poll(fds, 0)` for Unix; Windows continues to fall back to the consume-based probe (documented as a follow-up). Symbol allowlist updates: + golang.org/x/sys/unix.{Poll,PollFd,POLLIN,POLLHUP} + strconv.Atoi / strconv.Itoa (used by orderedIntValue) New scenarios: errors/invalid_name_after_valid.yaml flags/n_N_last_wins.yaml Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-dz · 2026-05-04T14:37:53Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 88b60b03dd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

When read hits EOF or times out before any data, bash assigns "" to every requested NAME (clearing any prior value) and then returns the non-zero exit code. The previous early-return on `line == ""` skipped the assignment loop, so `X=old; read X </dev/null` left X=old instead of clearing it. Removed the early return; the assignment loop now runs with the zero-valued `values` slice, which writes "" to each variable. The existing exit-code logic at the end of run() then surfaces 1 (EOF) or 142 (timeout) as appropriate. Verified bash matches: X=old; read X </dev/null → rc=1, X="" A=oldA B=oldB; read A B </dev/null → rc=1, A="", B="" good=oldgood mid=oldmid; read good 1bad mid </dev/null → rc=1, good="", mid=oldmid (loop stops at 1bad) { sleep 0.2; printf "x\n"; } | { X=old; read -t 0.05 X; } → rc=142, X="" New scenario at tests/scenarios/cmd/read/basic/eof_clears_existing_var.yaml locks in the multi-var EOF-clears-existing-value behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-dz · 2026-05-04T15:17:10Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c197eb924

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Bash's line-continuation behaviour for backslash-delim only applies when the delimiter is newline. With a custom -d delimiter, `\<delim>` is an escape — the delimiter character is appended literally: `printf 'a\,b,c' | read -d , x` should assign `a,b`. The previous code treated any `\<delim>` as continuation, dropping both bytes and silently losing data for any escaped custom-delim input. Tightened the line-continuation check to require `delim == '\n'` so non-newline delimiters fall through to the regular escape branch (backslash dropped, next rune appended). Verified against bash: printf 'a\,b,c' | read -d , x → x="a,b" printf 'a\,b,c' | read -d , -r x → x="a\\" printf 'a\\\nb\n' | read x → x="ab" (continuation, unchanged) printf 'a\\bc\n' | read x → x="abc" (regular escape, unchanged) New scenario at tests/scenarios/cmd/read/flags/escape_custom_delim.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-dz · 2026-05-04T15:52:25Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c9e80ef816

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c9e80ef816

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-04T15:57:49Z

💡 Codex Review

rshell/builtins/read/read.go

Lines 416 to 421 in c9e80ef

    
           readByte := func() (byte, error) { 
        
           	for { 
        
           		if err := ctx.Err(); err != nil { 
        
           			return 0, err 
        
           		} 
        
           		n, err := r.Read(one)

Honor run deadlines while read is blocked

When read is waiting on an idle stdin without its own -t flag, the whole-run context deadline is only checked before entering r.Read(one), so an interp.MaxExecutionTime, CLI --timeout, or parent context cancellation cannot interrupt a blocking terminal/pipe read. This newly allowed builtin can therefore hang an otherwise time-bounded rshell run until input arrives; propagate any existing ctx.Deadline() to pollable *os.File stdin (similar to the -t path) or otherwise arrange for the read to unblock on cancellation.

rshell/builtins/read/read.go

Lines 500 to 501 in c9e80ef

    
           if !ignoreDelim && nrn == delim && delim == '\n' { 
        
           	continue

Cap consumed bytes for continuations

For non-raw reads, an input made of repeated backslash-newline pairs never appends to buf and never increments runes, so the 1 MiB MaxReadBytes cap is not reached while read drains arbitrarily large input looking for a logical line terminator. In contexts where stdin is attacker-controlled, this bypasses the resource bound documented for the builtin and can burn unbounded CPU/I/O; count consumed bytes as well as buffered bytes before continuing here.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Two fixes from the latest codex pass: 1. Honour run deadlines while read is blocked (P1) When read waited on idle stdin without its own -t flag, an inherited context deadline (interp.MaxExecutionTime, CLI --timeout, parent ctx cancellation) was only checked between bytes, not inside the blocking r.Read syscall. A blocking terminal/pipe read could hang an otherwise time-bounded run. Lifted the SetReadDeadline propagation out of the `if -t` block so it fires whenever readCtx has any deadline. context.WithTimeout already returns the earlier of -t and the inherited deadline via ctx.Deadline(), so a single SetReadDeadline call covers both cases. Plumbed this for *os.File stdin (the runner's typical pipe shape); other readers still rely on the per-byte ctx.Err() check. 2. Cap consumed bytes for continuations (P2) Non-raw input made of repeated backslash-newline pairs never appended to buf and never incremented runes, so the 1 MiB MaxReadBytes cap (which only checked output buffer size) never fired while read drained the stream looking for a logical terminator. With attacker-controlled stdin this could burn unbounded CPU/I/O. Added a `consumed` counter that increments on every successful readByte. The cap is now checked in readByte itself, so any sequence of input — including discarded continuations or escape bytes — is bounded at MaxReadBytes total. The existing tryAppend size check is retained for defence in depth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-dz · 2026-05-04T16:11:04Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d19f22da19

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Two fixes from the latest codex pass: 1. Allow max-sized records to reach EOF or delimiter (P2) The new consumed-bytes cap fired with `>=` MaxReadBytes, which meant a 1 MiB line followed by its newline errored out before read could observe the terminator. The cap should fire only when another byte would push us past it. Changed the check to `>` so reading exactly MaxReadBytes data bytes plus a 1-byte delimiter succeeds. Drain attacks still bounded — they error after at most one byte over the cap. 2. Reject non-finite / oversized -t timeouts (P2) `strconv.ParseFloat` accepts NaN, Inf, scientific notation, and values whose nanosecond conversion overflows time.Duration into a negative duration that fires immediately. Bash rejects all of these as "invalid timeout specification". Added parseReadTimeout that pre-validates the timeout string is non-empty and contains only `[0-9.]` with at most one '.', then ParseFloat for the numeric value, then NaN/Inf/overflow guards. The error message now matches bash: read: NaN: invalid timeout specification Verified bash rejects: NaN, Inf, 1e6, 1e20, -5, notanumber Verified bash accepts: 1, 1.5, 0.5, .5, 1., 100000000 Symbol allowlist: math.IsNaN, math.IsInf, math.MaxInt64 added to the read sublist. Updated tests/scenarios/cmd/read/errors/invalid_timeout.yaml to exercise NaN, Inf, scientific notation, and negative inputs and to assert the bash-style error message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-dz · 2026-05-04T17:12:30Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9a240eb697

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Two fixes from the latest codex pass: 1. Skip NUL bytes unless NUL is the delimiter (P2) Bash silently strips embedded NUL bytes from the assigned value regardless of mode. The only case where NUL matters is `-d ''` (NUL delimiter), and that path is handled by the delim check above the new NUL skip. Verified bash produces "abc" for `printf 'a\0b\0c\n' | read x` and "abcd" for `read -N 5 -d '' <<< 'ab\0cd'` — neither retains the NUL byte. Added an `if rn == 0 { continue }` after the delim check in readInput. The NUL byte is still counted toward `consumed` (so pathological NUL-stuffed input remains bounded) but is not appended and does not increment `runes`. 2. Preserve abort semantics on variable storage exhaustion (P2) AST-level assignment treats `errTotalVarStorageExceeded` from the writeEnv as a script-aborting resource-cap violation (interp/vars.go: sets `r.exit.exiting = true`). The previous `read` SetVar wrapper called `r.setVarErr` and returned the raw error to the builtin, which surfaced it as a regular status-1 failure — letting the script continue past the cap and weakening the DoS guard. Added a public sentinel `builtins.ErrVarStorageExceeded`. The SetVar wrapper now translates the private interp sentinel to the public one (preserving the original error message via wrapping). `read` checks `errors.Is(err, builtins.ErrVarStorageExceeded)` and returns `Result{Code: 1, Exiting: true}` so the runner stops the script, matching AST-assignment behaviour. New scenario: tests/scenarios/cmd/read/basic/strips_nul_bytes.yaml locks in the NUL-strip behaviour against bash. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-dz · 2026-05-05T19:10:47Z

@codex review this PR (iteration 7). P0/P1/P2/P3 prefix.

matt-dz · 2026-05-05T19:12:19Z

Iteration 7 self-review result: 0 findings. SUCCESS_COUNT=4/5.

matt-dz · 2026-05-05T19:12:31Z

@codex review this PR (iteration 8). P0/P1/P2/P3 prefix.

matt-dz · 2026-05-05T19:14:13Z

Iteration 8 self-review result: 0 findings. SUCCESS_COUNT=5/5 — loop reached clean terminator.

matt-dz · 2026-05-05T19:14:47Z

Review-Fix Loop Summary

PR: feat(read): implement read builtin with IFS field-splitting #211 (feat(read): implement read builtin with IFS field-splitting #211)
Iterations completed: 8
Final status: CLEAN
Stopped because: SUCCESS_COUNT reached 5 consecutive clean iterations (iter 4, 5, 6, 7, 8)

Iteration log

#	Unresolved threads	P0/P1/P2 findings	Fixes applied	CI status
1	3 (3 self)	4 (0/0/2 + 2 P3) self + 2 P2 codex	6 fixed	Passing
2	1 codex	1 P2 codex	1 fixed	Passing
3	1 codex	1 P2 codex	1 fixed	Passing
4	0	0	—	Passing
5	0	0	—	Passing
6	0	0	—	Passing
7	0	0	—	Passing
8	0	0	—	Passing

Findings fixed across the loop

Iteration 1 (self + codex):

P2 multi-byte -d delim divergence (rune→byte semantics, full readInput rewrite)
P2 missing scenario for multi-byte -d
P3 kernelCancel watchdog ran in pollMode (reordered)
P3 goroutine retry did not observe ctx.Done (added)
P2 codex: preserve bytes after invalid UTF-8 (obsoleted by byte-level rewrite)
P2 codex: dot-only timeout -t . / -t +. should be 0 (added short-circuit)

Iteration 2 (codex):

P2 codex: pollAvailable non-Unix fallback consumed input (removed consume-based fallback)

Iteration 3 (codex):

P2 codex: bare + timeout should be 0 (added empty-after-strip short-circuit)

Iterations 4–8: zero findings; codebase converged.

Final state

Unresolved threads: 0 (from matt-dz or chatgpt-codex-connector)
P0/P1/P2 findings: 0
CI: Passing on all real checks. The only failing check is Static Analysis Label Check, which is the manual-approval gate intentionally exempt per AGENTS.md (Never add the verified/allowed_symbols GitHub label).

Remaining issues

None — PR is ready for human review and merge.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac7919fbc0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…fields Codex P2: when the last assigned field's absorbed remainder contains internal IFS characters (ws or non-ws), bash preserves a trailing non-ws IFS character verbatim. The previous splitIFS rule "strip a lone trailing non-ws IFS char if exactly one such char in the absorbed string" was incorrect — `IFS=' ,' read a <<< 'b a,'` should produce a="b a," (the comma is preserved because the space is internal IFS-ws), but rshell stripped the comma yielding a="b a". Fix: tighten the strip rule. Strip the trailing non-ws IFS char only when the candidate string (s minus that char, then trim trailing IFS-ws) contains NO IFS characters at all — i.e., a single-field shape. Multiple internal IFS chars (any kind) means the absorbed value represents data from multiple fields and is preserved verbatim including the trailing delim. Verified empirically against bash 5.2: IFS=' ,' read a <<< 'b a,' → a="b a," (preserve) IFS=' ,' read a <<< 'b,a,' → a="b,a," (preserve) IFS=' ,' read a b <<< 'a b c,' → b="b c," (preserve) IFS=' ,' read a <<< 'a,' → a="a" (strip, single-field) IFS=' ,' read a <<< 'a ,' → a="a" (strip, single-field) IFS=':' read a <<< 'b a:' → a="b a" (strip, ' ' not IFS) Add ifs/preserve_trailing_delim_multifield.yaml covering the four distinct preserve/strip cases. The existing trailing_single_ifs_stripped and separator_plus_spaces scenarios continue to pass under the refined rule (their inputs are all single-field shape). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-13b845 # Conflicts: # analysis/symbols_builtins.go # builtins/builtins.go # interp/runner_exec.go

matt-dz · 2026-05-06T12:51:43Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b131c9228

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

matt-dz · 2026-05-06T17:32:53Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector · 2026-05-06T17:39:12Z

💡 Codex Review

rshell/builtins/read/read.go

Lines 508 to 509 in 6b131c9

    
           // Non-Unix fallback: report not-available rather than consume. 
        
           return builtins.Result{Code: 1}

Implement non-consuming Windows poll for read -t 0

On non-Unix platforms, pollInputNonConsuming reports unsupported and this fallback unconditionally returns status 1, so Windows rshell always says read -t 0 has no data for non-regular stdin even when a pipe already has buffered bytes; for example, printf 'x\n' | { read -t 0 && read v; } takes the false branch on Windows while it succeeds on bash/Linux. Since the project documents Windows support for builtins, readiness checks using read -t 0 need a non-consuming implementation such as PeekNamedPipe/console polling rather than an unconditional false negative.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Codex P2: on non-Unix platforms, pollAvailable's fallback returned Code 1 unconditionally, so Windows rshell always reported `read -t 0` as "no data" for non-regular stdin even when buffered bytes were sitting in a pipe. Scripts using `printf x | { read -t 0 && read v; }` took the false branch on Windows while succeeding on bash/Linux. Implement non-consuming readiness probing on Windows via a new internal helper package: - builtins/internal/winpoll/winpoll_windows.go uses GetFileType(handle) to dispatch by descriptor type: * FILE_TYPE_DISK → always available (regular files don't block). * FILE_TYPE_PIPE → PeekNamedPipe via kernel32.dll reports the buffered byte count without consuming any data; lpTotalBytesAvail > 0 ⇒ available. * FILE_TYPE_CHAR → GetNumberOfConsoleInputEvents reports queued console-input event count (close enough; counts non-character events too, but matches bash-on-WSL heuristics). * other → unsupported, caller falls back to Code 1. - builtins/internal/winpoll/winpoll_other.go is a no-op stub for non-Windows builds. - builtins/read/poll_other.go now delegates to winpoll.PollNonConsuming instead of returning unsupported unconditionally. The PeekNamedPipe call uses unsafe.Pointer to pass &avail through syscall.SyscallN — narrow exception isolated to builtins/internal/ (per the same convention as winnet for iphlpapi.dll). No pointer arithmetic occurs; the returned uint32 is consumed directly. Update internalPerPackageSymbols and internalAllowedSymbols to allowlist the new Windows-specific symbols (GetFileType, GetNumberOfConsoleInputEvents, FILE_TYPE_* constants, Handle) plus the existing syscall.MustLoadDLL and unsafe.Pointer. Verified: GOOS=windows go build ./... succeeds; analysis tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-dz · 2026-05-06T18:27:20Z

[Claude Opus 4.7 (1M)] Addressed in 76cab76. Implemented non-consuming readiness probing on Windows via a new internal helper package builtins/internal/winpoll:

PeekNamedPipe via kernel32.dll for pipes — lpTotalBytesAvail > 0 ⇒ available, never consumes data.
GetNumberOfConsoleInputEvents for console/TTY (FILE_TYPE_CHAR).
Regular files (FILE_TYPE_DISK) report always-available (kernel reads don't block).
GetFileType dispatches by descriptor type; anything unrecognised falls through to the conservative "not available" answer.

builtins/read/poll_other.go (the !unix build) now delegates to winpoll.PollNonConsuming instead of returning unsupported unconditionally. The unsafe.Pointer use is isolated to builtins/internal/winpoll/winpoll_windows.go (single call site, no pointer arithmetic), following the same convention as builtins/internal/winnet for iphlpapi.dll.

Cross-build verified: GOOS=windows go build ./... succeeds; analysis tests pass; existing read tests on Linux/macOS unchanged.

matt-dz · 2026-05-06T18:31:39Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76cab766b8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Codex P2: on Unix, the poll path treats POLLHUP (peer hung up, no buffered data) as available for `read -t 0` because bash 5.2 considers EOF as "I can complete the read without blocking" and returns rc=0. The Windows winpoll implementation only reported `avail > 0`, so `printf '' | { read -t 0 v; }` reported exit 1 on Windows while bash reports 0. Per Microsoft's PeekNamedPipe docs, when the writer end of the pipe is closed the call returns FALSE with GetLastError() == ERROR_BROKEN_PIPE. Use that as the EOF-ready signal: on a broken-pipe failure return (available=true, supported=true); any other failure returns (false, false) so the caller falls back to the conservative "not available". Update internalPerPackageSymbols["winpoll"] and internalAllowedSymbols to allowlist syscall.Errno (for the type assertion) and golang.org/x/sys/windows.ERROR_BROKEN_PIPE. Verified: GOOS=windows go build ./... succeeds; analysis tests and existing read tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-dz · 2026-05-06T18:45:23Z

@codex conduct a comprehensive security and code review

chatgpt-codex-connector · 2026-05-06T18:52:46Z

Codex Review: Didn't find any major issues. Keep it up!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

AlexandreYang · 2026-05-06T19:36:19Z

+
+	"github.com/DataDog/rshell/builtins"
+)
+


similar to other commands, can we add a description of the command + description per flag?

AlexandreYang · 2026-05-06T19:37:15Z

+				// refuse to run with "variable access is not
+				// available", which is the closest analogue to bash's
+				// "exec read fails because read is a builtin, not an
+				// executable on PATH" behaviour.


why this comment is needed?

The omission of SetVar/GetVar in this struct literal is load-bearing: it's how find -exec isolation works. read checks c.SetVar == nil and refuses to mutate the calling scope (matching bash, which exec-forks per match). The comment exists so a future maintainer doesn't "fix" the missing fields and silently break the isolation. Test: tests/scenarios/cmd/read/hardening/find_exec_isolation.yaml.

AlexandreYang · 2026-05-06T19:37:54Z

+				vr := r.writeEnv.Get(name)
+				return vr.Str, vr.IsSet()
+			},
+			Proc: r.proc,


why those changes to interp/runner_exec.go are needed?

read is the first builtin that mutates parent-shell state, so the runner needs to expose the minimum surface for it:

SetVar (assign vars in the calling scope — the whole point of read).

GetVar (consult IFS for field-splitting).

Translate the internal errTotalVarStorageExceeded sentinel to the public builtins.ErrVarStorageExceeded so the builtin can propagate Result.Exiting on resource-cap exhaustion (matches AST-level setVar semantics).

Post-call ctx.Err() check so a read -t that hits the run-level deadline surfaces context.DeadlineExceeded to cmd/rshell main instead of the builtin's 142.

@AlexandreYang

@AlexandreYang requested fuller per-command and per-flag descriptions to match the verbose-help builtins. - Replace the single-line package doc with a comprehensive comment block (Usage, Flags table, Field splitting, Backslash escapes, Exit codes, Unsupported flags) modeled after echo.go's package comment. - Enrich the `--help` output handler so `read --help` (and, via builtins/help, `help read`) produces a fuller help screen: brief description → Options (pflag-generated from the existing per-flag descriptions, unchanged) → Field-splitting paragraph → Backslash-escape paragraph → Exit-status table → Unsupported- flag note. No `Help` field is added on the Cmd struct: read registers its own --help handler, and TestHelpFieldOnlyOnNoFlagsCommands enforces that flag-having builtins surface help via --help so `help read` and `read --help` produce identical output. The per-flag descriptions on the pflag.StringP/BoolP/VarP calls were already present and remain unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-dz and others added 2 commits May 1, 2026 13:28

chatgpt-codex-connector Bot reviewed May 1, 2026

View reviewed changes

Comment thread builtins/read/read.go Outdated

Comment thread builtins/read/read.go Outdated

Comment thread builtins/read/read.go Outdated

Comment thread builtins/read/read.go

matt-dz and others added 2 commits May 1, 2026 14:13

chatgpt-codex-connector Bot reviewed May 1, 2026

View reviewed changes

Comment thread builtins/read/read.go Outdated

Comment thread builtins/read/read.go Outdated

Comment thread builtins/read/read.go Outdated

matt-dz and others added 2 commits May 4, 2026 08:58

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread builtins/read/read.go Outdated

Comment thread builtins/read/read.go Outdated

Comment thread builtins/read/read.go Outdated

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread builtins/read/read.go Outdated

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread builtins/read/read.go Outdated

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread builtins/read/read.go Outdated

Comment thread builtins/read/read.go Outdated

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread builtins/read/read.go Outdated

Comment thread builtins/read/read.go Outdated

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread builtins/read/read.go Outdated

Comment thread builtins/read/read.go Outdated

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread builtins/read/read.go Outdated

Comment thread builtins/read/read.go

chatgpt-codex-connector Bot reviewed May 5, 2026

View reviewed changes

Comment thread builtins/read/read.go Outdated

matt-dz and others added 2 commits May 5, 2026 16:31

Merge remote-tracking branch 'origin/main' into claude/gifted-dewdney…

6b131c9

…-13b845 # Conflicts: # analysis/symbols_builtins.go # builtins/builtins.go # interp/runner_exec.go

chatgpt-codex-connector Bot reviewed May 6, 2026

View reviewed changes

Comment thread builtins/read/read.go

chatgpt-codex-connector Bot reviewed May 6, 2026

View reviewed changes

Comment thread builtins/internal/winpoll/winpoll_windows.go

matt-dz marked this pull request as ready for review May 6, 2026 19:32

matt-dz requested review from AlexandreYang, astuyve, julesmcrt, thieman and val06 as code owners May 6, 2026 19:32

matt-dz enabled auto-merge May 6, 2026 19:32

AlexandreYang reviewed May 6, 2026

View reviewed changes

Conversation

matt-dz commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Motivation

Surface

Memory & DoS hardening

IFS field-splitting

Testing

Reviewer notes

Checklist

Uh oh!

matt-dz commented May 1, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

matt-dz commented May 1, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

matt-dz commented May 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

matt-dz commented May 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

matt-dz commented May 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

matt-dz commented May 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot commented May 4, 2026

💡 Codex Review

Uh oh!

matt-dz commented May 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

matt-dz commented May 4, 2026

matt-dz commented May 1, 2026 •

edited

Loading